Beyond Fano's inequality: bounds on the optimal F-score, BER, and cost-sensitive risk and their implications

Authors

  • Ming-Jie Zhao
  • Narayanan Unny Edakunni
  • Adam Craig Pocock
  • Gavin Brown
Abstract

Fano’s inequality lower bounds the probability of transmission error through a communication channel. Applied to classification problems, it provides a lower bound on the Bayes error rate and motivates the widely used Infomax principle. In modern machine learning, however, we are often interested in more than just the error rate. In medical diagnosis, for example, different errors incur different costs; hence, the overall risk is cost-sensitive. Two other popular criteria are the balanced error rate (BER) and the F-score. In this work, we focus on the two-class problem and use a general definition of conditional entropy (including Shannon’s as a special case) to derive upper and lower bounds on the optimal F-score, BER and cost-sensitive risk, extending Fano’s result. As a consequence, we show that Infomax is not suitable for optimizing F-score or cost-sensitive risk, in that it can potentially lead to a low F-score and a high risk. For cost-sensitive risk, we propose a new conditional entropy formulation which avoids this inconsistency. In addition, we consider the common practice of thresholding the posterior probability to tune a classifier’s performance. As is widely known, a threshold of 0.5, where the posteriors cross, minimizes the error rate; we derive similar optimal thresholds for F-score and BER.

Similar resources

A Model for Runway Landing Flow and Capacity with Risk and Cost Benefit Factors

As demand for civil aviation has grown for decades and the system has become increasingly complex, systems engineering and operations research tools have proven increasingly useful in managing it. In this study, we apply such tools to managing landing operations on runways (the bottleneck and highly valuable resources of air transportation networks) to handle...

On Bayes Risk Lower Bounds

This paper provides a general technique for lower bounding the Bayes risk of statistical estimation, applicable to arbitrary loss functions and arbitrary prior distributions. A lower bound on the Bayes risk not only serves as a lower bound on the minimax risk, but also characterizes the fundamental limit of any estimator given the prior knowledge. Our bounds are based on the notion of f-inform...

Demonstrating Continuous Variable EPR Steering in spite of Finite Experimental Capabilities using Fano Steering Bounds

We show how one can demonstrate continuous-variable EPR-steering without needing to characterize entire measurement probability distributions. To do this, we develop a modified Fano inequality useful for discrete measurements of continuous variables, and use it to bound the conditional uncertainties in continuous-variable entropic EPR-steering inequalities. With these bounds, we show how one ca...

Distance-based and continuum Fano inequalities with applications to statistical estimation

In this technical note, we give two extensions of the classical Fano inequality in information theory. The first extends Fano’s inequality to the setting of estimation, providing lower bounds on the probability that an estimator of a discrete quantity is within some distance t of the quantity. The second inequality extends our bound to a continuum setting and provides a volume-based bound. We i...

Cost Stickiness: Value Creating or Value Destroying (The Iranian Experience)

This research reviews and tests two contradictory notions in the cost stickiness literature by empirically identifying the consequences of cost stickiness. Cost stickiness is consistent with both rational resource planning and the opportunistic incentives of managers to increase personal benefits arising from status and power. Although both mechanisms involve asymmetric retention of slack, some of the ...

Journal:
  • Journal of Machine Learning Research

Volume 14

Published 2013